Abstract

The inflated beta regression model enables the modeling of responses in the intervals (0,1], [0,1) or [0,1]. In this model, hypothesis testing is often performed based on the likelihood ratio statistic. The critical values are obtained from asymptotic approximations, which may lead to size distortions in small samples. To address this, this paper proposes the bootstrap Bartlett correction to the likelihood ratio statistic in the inflated beta regression model. The proposed adjustment only requires a simple Monte Carlo simulation. Through extensive Monte Carlo simulations, the finite sample performance (size and power) of the proposed corrected test is compared to that of the usual likelihood ratio test and of the Skovgaard adjustment already proposed in the literature. The numerical results show that inference based on the proposed correction is much more reliable than that based on the usual likelihood ratio statistic and the Skovgaard adjustment. At the end of the work, an application to real data is also presented.
Keywords: bootstrap Bartlett correction, improvements in small samples, inflated beta regression, likelihood ratio 
test. 


1 Introduction 


The beta regression model proposed by Ferrari and Cribari-Neto (2004) is appropriate when the dependent variable assumes values in the standard unit interval (0,1), such as rates, proportions or indexes. It is assumed that the response follows a beta law with constant precision parameter and mean parameter modeled by a regression structure. This regression structure is similar to that of the generalized linear model (GLM) (McCullagh and Nelder, 1989). The mean response is related to a linear predictor through a link function, and the linear predictor involves known covariates and unknown regression parameters (Ospina et al., 2006; Bayer and Cribari-Neto, 2013). In Parker et al. (2014) the authors present a discussion about the origins of beta regression models.

In rates and proportions data, values equal to zero and/or one can often be observed. Examples include the mortality rate for a given disease, the child labor rate and the proportion of hospital admissions for a certain cause, among other situations. In such cases the seminal model proposed in Ferrari and Cribari-Neto (2004) is not suitable. The log-likelihood function of the beta regression model becomes unbounded, and it is not possible to assume that the data come from an absolutely continuous distribution. For these cases, Ospina and Ferrari (2012) propose the inflated beta regression model, based on a mixture of a beta distribution and a Bernoulli distribution degenerate at zero and/or one. It is important to mention that a degenerate distribution is the probability distribution of a discrete random variable that assigns probability 1 to a single point (Sundarapandian, 2009). These inflated distributions allow users to model data that assume values in (0,1], [0,1) or [0,1] (Ospina and Ferrari, 2010). This work addresses the beta regression model inflated at zero or one.

The probability density function of the inflated beta distribution at zero or one has three parameters: the conditional mean ($\mu_t$), the precision ($\phi_t$) and the mixture parameter ($\alpha_t$). The latter determines the probability that the dependent variable is equal to one of the limits of the unit interval. In the inflated beta regression model, each one of these parameters is allowed to vary across observations, being modeled using regression structures that involve link functions, covariates and unknown parameters. The presence of regression structures for the three parameters that index the inflated beta density makes the problem of inference in small samples more severe, given the large number of parameters to be estimated.

The estimation of the parameters of the inflated beta regression model is based on maximum likelihood estimation (MLE), in which the inferential procedures are similar to those of GLMs. After point estimation, another important aspect of the modeling is hypothesis testing on the parameters of the model. One of the usual test statistics for hypothesis testing is the likelihood ratio (LR) (Neyman and Pearson, 1928). This is an approximate test, characterized by the use of critical values from approximations that are valid in large samples. However, these asymptotic approximations can be poor in small samples, resulting in considerable distortion of the probability of type I error (size) of the tests. Inferential improvements in small samples may be achieved by analytical or numerical/computational adjustments. Two important works on hypothesis testing and on finite-sample corrections to asymptotic tests are, respectively, Buse (1982) and Cribari-Neto and Cordeiro (1996).


Several studies have been developed to improve the performance of the likelihood ratio test in small samples. Among the proposals for inferential improvement, the Bartlett correction (Bartlett, 1937) stands out; its analytical derivation involves cumulants and mixed cumulants up to fourth order of the log-likelihood function. In Cysneiros and Ferrari (2006), the Bartlett correction is presented for nonlinear models of the exponential family. For improvements of the heteroscedasticity test in the normal linear regression model, Ferrari et al. (2004) use this correction. Also, in Melo et al. (2009b), the Bartlett correction is derived for the class of linear mixed models. In Bayer and Cribari-Neto (2013), the Bartlett correction in the beta regression model with constant dispersion is considered. However, the derivation of the Bartlett correction can be costly, or even impossible to obtain (Ferrari and Pinheiro, 2011; Bayer and Cribari-Neto, 2013), especially when the parameters are not orthogonal, as in the inflated beta regression model.


Another alternative is the Skovgaard adjustment (Skovgaard, 2001). Some recent papers consider this adjustment: it was developed for the class of nonlinear models of the exponential family (Ferrari and Cysneiros, 2008), for a new class of models for proportions (Melo et al., 2009a), for the beta regression model with variable dispersion (Ferrari and Pinheiro, 2011) and for the inflated beta regression model (Pereira and Cribari-Neto, 2014b). Although the Skovgaard adjustment is analytically less costly than the Bartlett correction, it still requires second-order derivatives of the log-likelihood function, which limits its use for inferential improvements, primarily in applied work.

With the same objective as the Skovgaard and Bartlett adjustments, which is to improve the approximation of the exact null distribution of the likelihood ratio statistic by the chi-squared distribution in small samples, the bootstrap Bartlett correction (Rocke, 1989) can be considered. In this second-order correction, the Bartlett correction factor (Lawley, 1956) is determined by the bootstrap method (Efron, 1979). The bootstrap Bartlett correction becomes a good numerical alternative to the analytical determination of the Bartlett correction factor, requiring only the use of a simple Monte Carlo simulation. The bootstrap Bartlett correction also has computational advantages over the usual bootstrap procedure for estimating the quantiles of the exact null distribution of the test statistic. While the usual bootstrap method requires a large number of resamples (usually above 1000), the numerical Bartlett correction requires a smaller number of bootstrap iterations (around 200 resamples) (Bayer and Cribari-Neto, 2013). Despite the advantages of the bootstrap Bartlett correction over other analytical and numerical approaches, this approach is rarely explored in the literature. One of the few studies that consider the bootstrap Bartlett correction was developed by Bayer and Cribari-Neto (2013), showing similar results for the analytical and bootstrap Bartlett corrections.

In order to improve inference in small samples in the inflated beta regression model, this work proposes the bootstrap Bartlett correction to the likelihood ratio statistic. The small-sample performance of the proposed test statistic is compared with that of the Skovgaard adjustment (Pereira and Cribari-Neto, 2014b) and of the usual likelihood ratio statistic, via Monte Carlo simulations. The approximations of the distributions of the statistics by the chi-squared distribution in samples of finite size are evaluated, and the influence of these approximations on the performance of hypothesis testing is verified, in terms of size and power of the tests.

This paper is organized as follows. Section 2 introduces the inflated beta regression model at zero or one, as well as link functions, the log-likelihood function and inferential details. In Section 3 the likelihood ratio test for the inflated beta regression model, the proposed bootstrap Bartlett correction and the Skovgaard adjustment for small samples are presented. Section 4 describes the Monte Carlo simulation experiment for finite samples and presents the numerical results and their discussion. In Section 5 an application to real data is presented and discussed. Finally, Section 6 presents the conclusions.


2 Zero-or-one inflated beta regression model


The beta regression model proposed in Ferrari and Cribari-Neto (2004) is based on a reparametrization of the beta density, indexed by the mean $\mu$ and the precision $\phi$. The parameter $\phi$ is considered constant and $\mu$ is modeled by a regression structure. The beta density is given as follows:

$$f(y;\mu,\phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0<y<1, \qquad (1)$$

where $0<\mu<1$, $\phi>0$ and $\Gamma(\cdot)$ is the gamma function, i.e. $\Gamma(u) = \int_0^\infty t^{u-1}e^{-t}\,dt$. Thus, if $y$ is a random variable with density given by Equation (1), we have:

$$E(y) = \mu,$$

$$Var(y) = \mu(1-\mu)/(1+\phi).$$
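As a quick numerical illustration of the mean and precision parametrization in Equation (1), the following minimal Python sketch evaluates the density and the two moments. The function names (`beta_logpdf`, `beta_pdf`, `beta_mean_var`) are hypothetical and introduced only for this example; they are not part of any package used in this work.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of the beta law indexed by mean mu and precision phi."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in the open interval (0, 1)")
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    p, q = mu * phi, (1.0 - mu) * phi  # usual beta shape parameters
    return (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
            + (p - 1.0) * math.log(y) + (q - 1.0) * math.log1p(-y))


def beta_pdf(y, mu, phi):
    """Density f(y; mu, phi) on the original scale."""
    return math.exp(beta_logpdf(y, mu, phi))


def beta_mean_var(mu, phi):
    """Mean and variance implied by the parametrization."""
    return mu, mu * (1.0 - mu) / (1.0 + phi)
```

With $\mu = 0.5$ and $\phi = 2$ the shape parameters are both equal to one, so the density reduces to the standard uniform and the variance to 1/12, which gives a simple sanity check of the formulas.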


For the inflated beta regression model, the dependent variable is assumed to follow a distribution whose density involves three parameters. Let $y_1,\ldots,y_n$ be independent random variables, in which each $y_t$, $t = 1,\ldots,n$, has inflated beta distribution at the point $c$ ($c = 0$ or $c = 1$), for which the density is given by (Pereira and Cribari-Neto, 2014b):

$$bi_c(y_t;\alpha_t,\mu_t,\phi_t) = \alpha_t^{I_{\{c\}}(y_t)}\,\left[(1-\alpha_t)\,f(y_t;\mu_t,\phi_t)\right]^{1-I_{\{c\}}(y_t)}, \qquad (2)$$

in which $I_{\{c\}}(y_t)$ is an indicator function that assumes value 1 if $y_t = c$ and 0 otherwise, $0<\alpha_t<1$ is the mixture parameter of the distribution, specified by $\alpha_t = \Pr(y_t = c)$ ($c = 0$ or $c = 1$), $0<\mu_t<1$ is the mean of $y_t$ conditional on $y_t\in(0,1)$, $\phi_t>0$ is the precision parameter and $f(y_t;\mu_t,\phi_t)$ is the beta density function given in Equation (1). If $c = 1$, the function given in Equation (2) is the density of a random variable with inflated beta distribution at one, $y\sim BEOI(\alpha,\mu,\phi)$. On the other hand, if $c = 0$, we have an inflated beta distribution at zero, $y\sim BEZI(\alpha,\mu,\phi)$. For $y_t$ with inflated beta distribution at $c$, where $c = 0$ or $c = 1$, the expectation and variance of $y_t$ are given by (Ospina and Ferrari, 2010, 2012):

$$E(y_t) = \alpha_t c + (1-\alpha_t)\mu_t,$$

$$Var(y_t) = (1-\alpha_t)\,\mu_t(1-\mu_t)/(\phi_t+1) + \alpha_t(1-\alpha_t)(c-\mu_t)^2.$$
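The mixture structure and the two moment formulas above can be checked numerically. The sketch below is illustrative only: the function names `inflated_beta_density` and `inflated_beta_mean_var` are assumptions made for this example, and the density is written piecewise (mass $\alpha$ at $c$, and $(1-\alpha)$ times the beta density on (0,1)).

```python
import math


def inflated_beta_density(y, alpha, mu, phi, c=1):
    """Mass alpha at the point c; (1 - alpha) * beta density on (0, 1)."""
    if c not in (0, 1):
        raise ValueError("c must be 0 or 1")
    if y == c:
        return alpha
    if not 0.0 < y < 1.0:
        return 0.0  # outside the support of the mixture
    p, q = mu * phi, (1.0 - mu) * phi
    log_beta = (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
                + (p - 1.0) * math.log(y) + (q - 1.0) * math.log1p(-y))
    return (1.0 - alpha) * math.exp(log_beta)


def inflated_beta_mean_var(alpha, mu, phi, c=1):
    """Unconditional mean and variance of the inflated beta law at c."""
    if c not in (0, 1):
        raise ValueError("c must be 0 or 1")
    mean = alpha * c + (1.0 - alpha) * mu
    var = ((1.0 - alpha) * mu * (1.0 - mu) / (phi + 1.0)
           + alpha * (1.0 - alpha) * (c - mu) ** 2)
    return mean, var
```

For example, with $\alpha = 0.2$, $\mu = 0.5$, $\phi = 4$ and $c = 1$, the formulas give a mean of 0.6 and a variance of 0.04 + 0.04 = 0.08; setting $\alpha = 0$ recovers the moments of the plain beta law.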


Thus, in the zero-or-one inflated beta regression model with varying dispersion, we have the following relations (Ospina and Ferrari, 2012; Pereira and Cribari-Neto, 2014b):

$$g(\mu_t) = \sum_{i=1}^{m} x_{it}\beta_i = \eta_t,$$

$$b(\phi_t) = \sum_{i=1}^{p} s_{it}\lambda_i = \kappa_t,$$

$$h(\alpha_t) = \sum_{i=1}^{M} z_{it}\gamma_i = \zeta_t,$$

with $t = 1,\ldots,n$, in which $\beta = (\beta_1,\ldots,\beta_m)^\top$, $\lambda = (\lambda_1,\ldots,\lambda_p)^\top$ and $\gamma = (\gamma_1,\ldots,\gamma_M)^\top$ are vectors of unknown parameters, where $\beta\in R^m$, $\lambda\in R^p$ and $\gamma\in R^M$; $x_{1t},\ldots,x_{mt}$, $s_{1t},\ldots,s_{pt}$ and $z_{1t},\ldots,z_{Mt}$ represent the fixed and known covariates ($m+p+M<n$); and $g(\cdot)$, $b(\cdot)$ and $h(\cdot)$ are strictly monotonic and twice differentiable link functions, such that $g:(0,1)\to R$, $b:(0,\infty)\to R$ and $h:(0,1)\to R$ (Pereira and Cribari-Neto, 2014a). Different link functions can be used: the logit, $g(\mu) = \log[\mu/(1-\mu)]$; the probit, $g(\mu) = \Phi^{-1}(\mu)$, in which $\Phi(\cdot)$ is the standard normal distribution function; the complementary log-log, $g(\mu) = \log[-\log(1-\mu)]$; the log-log, $g(\mu) = \log[-\log(\mu)]$; and the Cauchy, $g(\mu) = \tan[\pi(\mu-0.5)]$. For the precision submodel, links such as the logarithmic, $b(\phi) = \log(\phi)$, and the square root, $b(\phi) = \sqrt{\phi}$, can be used. For details on link functions, see McCullagh and Nelder (1989) and Koenker and Yoon (2009).
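To make the role of the link functions concrete, the sketch below pairs each link listed above with its inverse, using only the Python standard library. The dictionary name `LINKS` and the string keys are hypothetical labels chosen for this illustration; the log-log link follows the form written in the text, and the logarithmic and square root links are the ones mentioned for the precision submodel.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry maps a label to (link, inverse link).
LINKS = {
    "logit": (lambda m: math.log(m / (1.0 - m)),
              lambda e: 1.0 / (1.0 + math.exp(-e))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda m: math.log(-math.log1p(-m)),
                lambda e: 1.0 - math.exp(-math.exp(e))),
    "loglog": (lambda m: math.log(-math.log(m)),
               lambda e: math.exp(-math.exp(e))),
    "cauchit": (lambda m: math.tan(math.pi * (m - 0.5)),
                lambda e: 0.5 + math.atan(e) / math.pi),
    "log": (math.log, math.exp),               # precision submodel
    "sqrt": (math.sqrt, lambda e: e * e),      # precision submodel
}


def linear_predictor(covariates, coefficients):
    """Inner product of a covariate row with its coefficient vector."""
    return sum(x * b for x, b in zip(covariates, coefficients))
```

For instance, `LINKS["logit"][1](linear_predictor((1.0, 0.5), (-1.0, 3.5)))` maps the linear predictor $-1 + 3.5 \times 0.5 = 0.75$ back to a mean in (0,1).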


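To obtain the maximum likelihood estimators of the parameter vector $\theta = (\beta^\top,\lambda^\top,\gamma^\top)^\top$ it is necessary to maximize the logarithm of the likelihood function. The log-likelihood function for $\theta = (\beta^\top,\lambda^\top,\gamma^\top)^\top$ can be written in the following way (Pereira and Cribari-Neto, 2014b):

$$\ell(\theta) = \sum_{t=1}^{n}\left\{a_t + (1-y_t^c)\,b_t\right\} = \iota^\top a + (\iota - y^c)^\top b, \qquad (3)$$

in which $y^c = (y_1^c,\ldots,y_n^c)^\top$, $y^* = (y_1^*,\ldots,y_n^*)^\top$, $y^\dagger = (y_1^\dagger,\ldots,y_n^\dagger)^\top$, $a = (a_1,\ldots,a_n)^\top$, $b = (b_1,\ldots,b_n)^\top$, $a_t = \log(1-\alpha_t) + \alpha_t^* y_t^c$ and $b_t = \log\Gamma(\phi_t) - \log\Gamma(\mu_t\phi_t) - \log\Gamma((1-\mu_t)\phi_t) + (\mu_t\phi_t - 1)y_t^* + (\phi_t - 2)y_t^\dagger$. Here $\iota$ is the $n$-dimensional column vector of ones, $\alpha_t^* = \log(\alpha_t/(1-\alpha_t))$,

$$y_t^c = \begin{cases} 1, & y_t = c, \\ 0, & y_t\in(0,1), \end{cases} \qquad y_t^* = \begin{cases} \log\left(\frac{y_t}{1-y_t}\right), & y_t\in(0,1), \\ 0, & y_t = c, \end{cases} \qquad y_t^\dagger = \begin{cases} \log(1-y_t), & y_t\in(0,1), \\ 0, & y_t = c. \end{cases}$$

The $n\times n$ diagonal matrices $\mathrm{diag}\{\alpha_1^*,\ldots,\alpha_n^*\}$, $\mathrm{diag}\{\mu_1,\ldots,\mu_n\}$, $\mathrm{diag}\{1-y_1^c,\ldots,1-y_n^c\}$ and $\Phi = \mathrm{diag}\{\phi_1,\ldots,\phi_n\}$, together with the $n\times n$ identity matrix, enter the corresponding matrix expressions, including the information matrix; see Ospina and Ferrari (2012) and Pereira and Cribari-Neto (2014b).

The maximum likelihood estimators do not have closed form, so iterative numerical methods are needed to maximize the log-likelihood function, such as the Newton method or quasi-Newton methods such as BFGS (Press et al., 1993).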
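The log-likelihood in Equation (3) is a sum of per-observation terms: $\log\alpha_t$ when $y_t = c$, and $\log(1-\alpha_t)$ plus the beta log-density otherwise. The following sketch evaluates it directly for given parameter values, using the quantities $y_t^c$, $y_t^*$ and $y_t^\dagger$ defined above. It is a hedged illustration: the function name `inflated_beta_loglik` is hypothetical, the parameters are passed per observation (already mapped through the inverse links), and no maximization is performed.

```python
import math


def inflated_beta_loglik(y, alpha, mu, phi, c=1):
    """Sum over t of a_t + (1 - y_t^c) * b_t for the inflated beta model at c."""
    total = 0.0
    for yt, at, mt, pt in zip(y, alpha, mu, phi):
        if yt == c:
            y_c, y_star, y_dag = 1.0, 0.0, 0.0
        elif 0.0 < yt < 1.0:
            y_c = 0.0
            y_star = math.log(yt / (1.0 - yt))
            y_dag = math.log1p(-yt)
        else:
            raise ValueError("each y_t must equal c or lie in (0, 1)")
        alpha_star = math.log(at / (1.0 - at))
        a_t = math.log1p(-at) + alpha_star * y_c
        b_t = (math.lgamma(pt) - math.lgamma(mt * pt)
               - math.lgamma((1.0 - mt) * pt)
               + (mt * pt - 1.0) * y_star + (pt - 2.0) * y_dag)
        total += a_t + (1.0 - y_c) * b_t
    return total
```

As a check, with two observations $y = (1, 0.3)$, $\alpha_t = 0.2$, $\mu_t = 0.5$ and $\phi_t = 2$ (so that the beta part is uniform), the value should be $\log 0.2 + \log 0.8 = \log 0.16$. In practice the function would be handed to a numerical optimizer after composing it with the inverse link functions.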


The inflated beta regression model is part of the class of generalized additive models for location, scale and shape (GAMLSS) (Rigby and Stasinopoulos, 2005). Thus, the inflated beta regression models considered in this work are fitted using the gamlss package (Stasinopoulos and Rigby, 2007), available in the R environment (R Development Core Team, 2014). The log-likelihood maximizations were carried out using the RS algorithm, which is a generalization of the algorithm used by Rigby and Stasinopoulos (1996a,b) for fitting mean and dispersion additive models (MADAM) (Stasinopoulos et al., 2008). This algorithm is well suited for situations in which the parameters are orthogonal, does not require accurate starting values for the parameters to achieve convergence (the default starting values, often constants, are usually adequate) and handles large data sets quite efficiently (Stasinopoulos et al., 2008).


3 Likelihood ratio test and small sample corrections 

Let $y_1,\ldots,y_n$ be independent random variables and assume that each $y_t$, $t = 1,\ldots,n$, has density function given by Equation (2). Additionally, let $\theta = (\beta^\top,\lambda^\top,\gamma^\top)^\top$ be the vector of unknown parameters that index the inflated beta regression model at zero or one. Write the parameter vector as $\theta = (\nu^\top,\tau^\top)^\top$, wherein $\nu = (\nu_1,\ldots,\nu_q)^\top$ is the vector of parameters of interest and $\tau = (\tau_1,\ldots,\tau_s)^\top$ is the vector of nuisance parameters, where $m+p+M = q+s$. Suppose the interest is in testing the null hypothesis $\mathcal{H}_0: \nu = \nu_0$, where $\nu_0$ is a specified vector of constants of dimension $q$. The likelihood ratio statistic is given by:

$$LR = 2\left[\ell(\hat\theta) - \ell(\tilde\theta)\right],$$

where $\ell(\theta)$ is the log-likelihood function given in Equation (3), evaluated at $\theta = (\nu^\top,\tau^\top)^\top$, $\hat\theta = (\hat\nu^\top,\hat\tau^\top)^\top$ is the unrestricted MLE of $\theta$ and $\tilde\theta = (\nu_0^\top,\tilde\tau^\top)^\top$ is the restricted MLE of $\theta$ (under the null hypothesis).

Under usual regularity conditions and under $\mathcal{H}_0$, the LR statistic has approximately a $\chi^2_q$ distribution with error of order $n^{-1}$ (Casella and Berger, 2002; Pereira and Cribari-Neto, 2014b; Bayer and Cribari-Neto, 2013), where $q$ is the number of parameters tested in the null hypothesis. However, in samples of finite size these approximations can be poor, resulting in size distortions. In this context, analytical or numerical/computational adjustments may be considered for inferential improvements in small samples. In what follows, the bootstrap Bartlett correction proposed in this paper for the likelihood ratio statistic in the inflated beta regression model is presented, as well as the Skovgaard adjustment for the inflated beta model given in Pereira and Cribari-Neto (2014b).

3.1 Bootstrap Bartlett correction

In order to improve the performance of the likelihood ratio test in small samples, Bartlett (1937) introduced the Bartlett correction, later generalized by Lawley (1956). The Bartlett-corrected statistic is given by:

$$LR_{Bartlett} = \frac{LR}{c},$$

where $c = E(LR)/q$ is known as the Bartlett correction factor. The determination of $c$ using Lawley's (1956) notation involves the product of cumulants and mixed cumulants up to fourth order that are not invariant under permutation (Cordeiro, 1993). In beta regression models the analytical derivation of $c$ can be costly or even impossible, especially because of the non-orthogonality of the parameters (Ferrari and Pinheiro, 2011; Bayer and Cribari-Neto, 2013). For the inflated beta regression model with variable dispersion, considered in this work, the analytical derivation of the Bartlett correction becomes practically intractable.

As a numerical alternative to the analytical derivation of the Bartlett correction, Rocke (1989) introduces the bootstrap Bartlett correction, where the correction factor $c$ is determined via the bootstrap method (Efron, 1979). The bootstrap Bartlett correction becomes a viable alternative for inferential improvements in small samples when the analytical difficulties are prohibitive or too costly, as in the model considered here.

The bootstrap Bartlett correction, with the expected value of LR estimated directly from the observed sample $y = (y_1,\ldots,y_n)^\top$ using the bootstrap, can be described by the following steps:

1. Generate, under $\mathcal{H}_0$, $B$ bootstrap resamples $(y^{*1},\ldots,y^{*B})$ from the model, replacing the model parameters by the estimates obtained under $\mathcal{H}_0$ using the original sample (parametric bootstrap).

2. Obtain the bootstrap LR statistic for each pseudosample $y^{*b}$, with $b = 1,\ldots,B$, calculated in the following way:

$$LR^{*b} = 2\left\{\ell(\hat\theta^{*b};y^{*b}) - \ell(\tilde\theta^{*b};y^{*b})\right\},$$

in which $\hat\theta^{*b}$ is the MLE of $\theta$ under the alternative hypothesis $\mathcal{H}_1$, and $\tilde\theta^{*b}$ is the MLE under $\mathcal{H}_0$.

3. Calculate the corrected LR statistic, given by:

$$LR_b = \frac{LR\,q}{\overline{LR}^*}, \qquad (4)$$

in which $\overline{LR}^* = \frac{1}{B}\sum_{b=1}^{B} LR^{*b}$.
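The three steps above can be sketched in a few lines of code. To keep the example self-contained, the sketch below replaces the inflated beta regression model with a deliberately simple stand-in: an exponential sample with rate $\lambda$ and the simple null hypothesis $\lambda = \lambda_0$ (so $q = 1$ and there are no nuisance parameters), for which the LR statistic has the closed form $2n[\lambda_0\bar y - 1 - \log(\lambda_0\bar y)]$. The function names `lr_statistic` and `bootstrap_bartlett` are hypothetical, and the toy model is an assumption made purely for illustration; only the resampling logic mirrors the procedure described here.

```python
import math
import random


def lr_statistic(sample, rate0):
    """LR statistic for H0: rate = rate0 in an exponential model (toy stand-in)."""
    n = len(sample)
    r = rate0 * sum(sample) / n  # rate0 times the sample mean
    return 2.0 * n * (r - 1.0 - math.log(r))


def bootstrap_bartlett(sample, rate0, q=1, n_boot=200, rng=None):
    """Return (LR, LR_b), with LR_b = LR * q / mean of the bootstrap LR values."""
    rng = rng or random.Random()
    n = len(sample)
    lr = lr_statistic(sample, rate0)
    boot_values = []
    for _ in range(n_boot):
        # Step 1: parametric resample generated under the null hypothesis.
        pseudo = [rng.expovariate(rate0) for _ in range(n)]
        # Step 2: LR statistic recomputed on the pseudosample.
        boot_values.append(lr_statistic(pseudo, rate0))
    mean_boot = sum(boot_values) / n_boot
    if mean_boot <= 0.0:
        raise ArithmeticError("bootstrap mean of LR is not positive")
    # Step 3: rescale so that the corrected statistic has mean close to q.
    return lr, lr * q / mean_boot
```

A call such as `bootstrap_bartlett(data, rate0=1.0, n_boot=200, rng=random.Random(1))` returns both the uncorrected and the corrected statistic; the corrected value is then compared with the usual $\chi^2_q$ critical value. For a regression model, the two closed-form LR evaluations would be replaced by restricted and unrestricted numerical fits on each pseudosample.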

In the bootstrap Bartlett correction the LR statistic is corrected so that its distribution in small samples can be better approximated by the reference null distribution, $\chi^2_q$ (Bayer and Cribari-Neto, 2013). Meanwhile, the usual bootstrap correction consists of obtaining a bootstrap approximation for the null distribution of the test statistic (Cribari-Neto and Queiroz, 2014). Rocke (1989) states that the bootstrap Bartlett correction has computational advantages compared to the usual bootstrap scheme, and that with $B = 100$ it yields, in general, results equivalent to those of the usual bootstrap method with $B = 700$. Also, through simulation studies, Bayer and Cribari-Neto (2013) conclude that $B$ values larger than 200 lead to negligible improvements for the bootstrap Bartlett correction. In this sense, the bootstrap Bartlett correction has good computational advantages over the usual bootstrap method for correcting hypothesis tests.


3.2 Skovgaard adjustment 


Another possible correction of the likelihood ratio statistic is Skovgaard's adjustment, originally presented in Skovgaard (1996) and subsequently generalized in Skovgaard (2001). This adjustment, obtained analytically, is considerably simpler than the Bartlett correction (Pereira and Cribari-Neto, 2014a). Skovgaard's adjustment only requires first- and second-order log-likelihood cumulants and, unlike the Bartlett correction, does not depend on the orthogonality of the parameters.

Skovgaard's approximation has been used in different models, among them nonlinear models of the exponential family (Ferrari and Cysneiros, 2008) and extreme value models (Ferrari and Pinheiro, 2014). In the class of beta regression models, the Skovgaard adjustment is available for the beta regression model with varying dispersion (Ferrari and Pinheiro, 2011) and for the inflated beta regression model with varying dispersion (Pereira and Cribari-Neto, 2014b). The results of these studies indicate that the test based on the Skovgaard statistic performs better than the test based on the uncorrected LR statistic.
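The likelihood ratio statistic modified by Skovgaard (Skovgaard, 2001) is given by:

$$LR_{sk1} = LR\left(1 - \frac{1}{LR}\log\xi\right)^2,$$

in which

$$\xi = \frac{|\tilde I|^{1/2}\,|\hat I|^{1/2}\,|\tilde J_{\tau\tau}|^{1/2}}{|\bar\Upsilon|\,\left|[\tilde I\,\bar\Upsilon^{-1}\hat J\,\hat I^{-1}\bar\Upsilon]_{\tau\tau}\right|^{1/2}}\cdot\frac{\{\tilde U^\top\bar\Upsilon^{-1}\hat I\,\hat J^{-1}\bar\Upsilon\,\tilde I^{-1}\tilde U\}^{q/2}}{LR^{q/2-1}\,\tilde U^\top\bar\Upsilon^{-1}\bar q},$$

where $I$ is the expected information matrix, $J$ is the observed information matrix, $U$ is the total score function, $\bar\Upsilon = E_{\hat\theta}[U(\hat\theta)U^\top(\tilde\theta)]$, $\bar q = E_{\hat\theta}[U(\hat\theta)(\ell(\hat\theta)-\ell(\tilde\theta))]$ (a vector, not to be confused with the number of restrictions $q$) and $J_{\tau\tau}$ is the $s\times s$ observed information matrix corresponding to the vector $\tau$. A "hat" denotes evaluation at the unrestricted MLE and a "tilde" evaluation at the restricted MLE. An asymptotically equivalent version of $LR_{sk1}$ is given by:

$$LR_{sk2} = LR - 2\log\xi.$$

Under the null hypothesis, the statistics $LR_{sk1}$ and $LR_{sk2}$ have approximately the $\chi^2_q$ distribution with high precision (Pereira and Cribari-Neto, 2014b). For details on the analytical derivation of the Skovgaard adjustment in the inflated beta regression model, see Pereira and Cribari-Neto (2014b).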
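The relation between the two versions of the Skovgaard statistic can be seen by expanding the square in the first one. The short derivation below uses only the two definitions, $LR_{sk1} = LR(1 - \log\xi/LR)^2$ and $LR_{sk2} = LR - 2\log\xi$, and assumes $LR > 0$:

```latex
\begin{align*}
LR_{sk1} &= LR\left(1 - \frac{1}{LR}\log\xi\right)^{2}
          = LR - 2\log\xi + \frac{(\log\xi)^{2}}{LR} \\
         &= LR_{sk2} + \frac{(\log\xi)^{2}}{LR}.
\end{align*}
```

The two versions therefore differ by the nonnegative term $(\log\xi)^2/LR$, so $LR_{sk1} \geq LR_{sk2}$. In addition, $LR_{sk1}$ is nonnegative by construction, whereas $LR_{sk2}$ can be negative when $2\log\xi > LR$.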


4 Numerical results 


To evaluate the small-sample performance of the proposed statistic $LR_b$, given in (4), the usual likelihood ratio statistic (LR) and the two versions of the Skovgaard adjustment ($LR_{sk1}$ and $LR_{sk2}$), a simulation study was performed. The number of Monte Carlo replications was 5000, and $B = 200$ bootstrap resamples were considered for the bootstrap Bartlett correction. The sample sizes used were 30, 40 and 50. The entire computational implementation was developed in the R language (R Development Core Team, 2014), and for the estimation of the model parameters the gamlss package (Stasinopoulos and Rigby, 2007) was used.


Table 1: Null rejection rates (%); submodels for $\mu$, $\phi$ and $\alpha$

                        1%                   5%                  10%
 q  Stat       n=30   n=40   n=50   n=30   n=40   n=50   n=30   n=40   n=50
Submodel for $\mu$
 1  LR         3.16   2.20   2.02   9.84   7.94   7.26  16.84  13.62  13.66
    LR_b       0.80   0.90   1.18   4.80   4.84   5.34   9.22   9.52  10.20
    LR_sk1     1.10   1.18   1.48   5.08   5.26   5.22   9.98  10.42  10.50
    LR_sk2     0.76   0.88   1.22   4.50   4.94   4.92   8.90   9.82  10.18
 2  LR         3.22   2.24   2.10  10.06   8.24   7.46  17.22  15.06  13.18
    LR_b       0.80   0.88   1.12   4.96   4.56   4.70   9.50   9.58   9.56
    LR_sk1     1.44   1.36   1.40   6.10   5.68   5.50  11.64  11.42  10.60
    LR_sk2     1.36   1.36   1.40   5.90   5.60   5.48  11.20  11.28  10.54
Submodel for $\phi$
 1  LR         2.54   1.90   1.34   8.34   7.56   6.36  14.80  13.80  11.72
    LR_b       0.46   0.90   0.68   3.80   4.58   4.22   8.04   9.64   8.84
    LR_sk1     1.82   1.58   1.44   7.00   6.34   5.94  12.84  11.88  11.28
    LR_sk2     1.32   1.24   1.08   6.34   5.88   5.54  12.00  11.18  10.74
 2  LR         2.62   2.24   1.92   9.68   8.78   7.50  16.86  14.62  13.66
    LR_b       1.00   0.94   1.10   4.62   5.38   5.08   9.74  10.20  10.42
    LR_sk1     1.54   1.22   1.16   6.74   6.20   5.58  12.80  11.24  11.22
    LR_sk2     1.40   1.20   1.14   6.30   6.04   5.56  12.44  11.00  11.08
Submodel for $\alpha$
 1  LR         1.70   1.70   1.28   6.50   6.38   5.64  12.18  11.88  11.12
    LR_b       0.80   1.06   1.04   4.42   4.86   4.72   9.18   9.12   9.96
    LR_sk1     0.84   1.12   1.06   4.68   5.10   4.86   9.76   9.76  10.00
    LR_sk2     0.82   1.08   1.04   4.50   4.98   4.80   9.46   9.62   9.92
 2  LR         1.96   2.04   1.90   8.46   7.80   6.62  14.22  14.02  11.70
    LR_b       0.62   0.80   1.16   3.60   4.80   4.52   8.06   9.54   8.90
    LR_sk1     1.34   0.96   1.12   4.80   5.28   4.78   9.60  10.46   9.54
    LR_sk2     0.58   0.76   1.08   3.90   4.88   4.68   8.64  10.04   9.50

All results for evaluating the null rejection rate (size) of the tests, shown in Table 1, consider the one-inflated beta regression model. In this table the best results are highlighted. Nominal levels of 1%, 5% and 10% were considered. In the evaluation of the tests on the parameters of the mean submodel, the following regression structure was considered for the mean, precision and mixture parameters:

$$g(\mu_t) = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t},$$

$$b(\phi_t) = \lambda_0 + \lambda_1 s_{1t},$$

$$h(\alpha_t) = \gamma_0 + \gamma_1 z_{1t},$$

in which $t = 1,\ldots,n$. For the regression structures of the mean, $g(\mu_t)$, and of the mixture, $h(\alpha_t)$, the logit link function was used, and for the structure of the precision parameter, $b(\phi_t)$, the logarithmic link function.

In the Monte Carlo simulation, we consider two scenarios for the null hypothesis: (i) $q = 1$, in which $\mathcal{H}_0: \beta_2 = 0$, fixing the parameters $\beta_0 = -1$, $\beta_1 = 3.5$, $\beta_2 = 0$, $\lambda_0 = 5.1$, $\lambda_1 = -2.8$, $\gamma_0 = -2$, $\gamma_1 = 1.5$; and (ii) $q = 2$, $\mathcal{H}_0: \beta_1 = \beta_2 = 0$, where $\beta_0 = 2$, $\beta_1 = \beta_2 = 0$, with the same parameter values for the $\phi$ and $\alpha$ submodels considered for $q = 1$. These values for the parameters in (i) imply averages of $y$ and $\phi$ equal, respectively, to 0.731 and 55.102, when $n = 50$. For (ii), the averages of $y$ and $\phi$ are, respectively, equal to 0.908 and 55.102, with $n = 50$. The matrix of regressors is generated from a standard uniform distribution, $U(0,1)$, and kept constant during all Monte Carlo replications. For each replication, a sample $y_1,\ldots,y_n$ is generated with one-inflated beta distribution given by Equation (2).
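The data-generating step of scenario (i) can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study, which was written in R: the helper names `expit`, `make_design` and `draw_sample` are hypothetical, the covariates are drawn once from $U(0,1)$ and reused, and each response equals one with probability $\alpha_t$ and is otherwise drawn from a beta law with shape parameters $\mu_t\phi_t$ and $(1-\mu_t)\phi_t$.

```python
import math
import random


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def make_design(n, rng):
    """Covariates (x1, x2, s1, z1) drawn once from U(0,1) and then kept fixed."""
    return [(rng.random(), rng.random(), rng.random(), rng.random())
            for _ in range(n)]


def draw_sample(design, beta, lam, gamma, rng):
    """One replication: responses and the implied (mu, phi, alpha) per observation."""
    y, params = [], []
    for x1, x2, s1, z1 in design:
        mu = expit(beta[0] + beta[1] * x1 + beta[2] * x2)   # logit link
        phi = math.exp(lam[0] + lam[1] * s1)                # log link
        alpha = expit(gamma[0] + gamma[1] * z1)             # logit link
        if rng.random() < alpha:
            y.append(1.0)                                   # inflation at one
        else:
            y.append(rng.betavariate(mu * phi, (1.0 - mu) * phi))
        params.append((mu, phi, alpha))
    return y, params
```

With $\lambda_0 = 5.1$ and $\lambda_1 = -2.8$, the average of $\phi_t$ over a uniform covariate is $(e^{5.1} - e^{2.3})/2.8 \approx 55.0$, which is in line with the averages near 55 reported for $n = 50$.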

We also consider tests on the parameters of the submodel for precision ($\phi$). In these cases we consider the one-inflated beta regression model given by:

$$g(\mu_t) = \beta_0 + \beta_1 x_{1t},$$

$$b(\phi_t) = \lambda_0 + \lambda_1 s_{1t} + \lambda_2 s_{2t},$$

$$h(\alpha_t) = \gamma_0 + \gamma_1 z_{1t}.$$

To evaluate the null rejection rate of the tests, the following scenarios were considered: (i) $q = 1$, $\mathcal{H}_0: \lambda_2 = 0$, fixing the parameters $\beta_0 = -1$, $\beta_1 = 3.5$, $\lambda_0 = 5.1$, $\lambda_1 = -2.8$, $\lambda_2 = 0$, $\gamma_0 = -2$, $\gamma_1 = 1.5$; and (ii) $q = 2$, $\mathcal{H}_0: \lambda_1 = \lambda_2 = 0$, considering $\lambda_0 = 5.1$, $\lambda_1 = \lambda_2 = 0$. The average values of $y$ and $\phi$ are, respectively, equal to 0.728 and 54.865 for (i) with $n = 50$. For (ii), with $n = 50$, the averages of $y$ and $\phi$ are, respectively, equal to 0.728 and 164.022.

Further, to evaluate the null rejection rate of the tests for inferences about the parameters of the $\alpha$ submodel, we considered the following regression structure:

$$g(\mu_t) = \beta_0 + \beta_1 x_{1t},$$

$$b(\phi_t) = \lambda_0 + \lambda_1 s_{1t},$$

$$h(\alpha_t) = \gamma_0 + \gamma_1 z_{1t} + \gamma_2 z_{2t}.$$

In this case, we considered: (i) $q = 1$, $\mathcal{H}_0: \gamma_2 = 0$, fixing the parameters $\beta_0 = -1$, $\beta_1 = 3.5$, $\lambda_0 = 5.1$, $\lambda_1 = -2.8$, $\gamma_0 = -2$, $\gamma_1 = 1.5$; and (ii) $q = 2$, $\mathcal{H}_0: \gamma_1 = \gamma_2 = 0$, considering $\gamma_0 = -2$. These values for the parameters in (i) imply averages of $y$ and $\phi$ equal, respectively, to 0.728 and 55.001, when $n = 50$. For (ii), the averages of $y$ and $\phi$ are, respectively, equal to 0.688 and 55.001, with $n = 50$.

Table 2: Estimated quantiles and moments of the test statistics for the submodel for $\mu$, $q = 2$ and $n = 40$

Variate          Mean   Variance   Skewness   Kurtosis 90th perc. 95th perc. 99th perc.
chi^2_q         2.000      4.000      2.000      9.000      4.605      5.991      9.210
LR              2.400      5.438      1.828      7.345      5.526      7.076     11.015
LR_b            1.963      4.028      1.831      7.365      4.558      5.827      9.000
LR_sk1          2.105      4.456      1.819      7.262      4.884      6.236      9.604
LR_sk2          2.089      4.431      1.818      7.260      4.859      6.213      9.592

Examining Table 1, which presents the results on the size of the tests, it is found for the $\mu$ submodel that the LR test is the most liberal, showing rejection rates well above the nominal levels. For example, at the 5% and 10% levels, for $n = 30$ and $q = 2$, the rejection rates for LR are, respectively, 10.06% and 17.22%. The corrected statistics, both the bootstrap Bartlett correction and the two versions of the Skovgaard adjustment, have less size distortion than the test based on the usual uncorrected statistic. When only one restriction is imposed, i.e., $q = 1$, $LR_b$ showed good performance, but the $LR_{sk1}$ statistic showed the best results for $n = 30$. For $q = 2$, the proposed $LR_b$ statistic has the best performance for all sample sizes and significance levels. Still, among the corrected statistics, the most liberal is $LR_{sk1}$, i.e., it has in general a rejection rate higher than the nominal level. Given this liberal tendency of $LR_{sk1}$, its estimated power is expected to be higher.

For the tests on the parameters of the $\phi$ submodel, it can also be verified that the corrected statistics have better size results. In particular, we highlight the performance of the proposed statistic $LR_b$ when two restrictions are imposed under the null hypothesis. Also, it can be seen that the versions corrected by Skovgaard are more liberal. For example, at the 10% level with $q = 2$, the null rejection rates of $LR_{sk1}$ are 12.80% ($n = 30$), 11.24% ($n = 40$) and 11.22% ($n = 50$).

For inferences about the parameters of the $\alpha$ submodel, as shown in Table 1, the best results are also achieved by the corrected statistics. As expected, tests on the parameters that index the mixture parameter submodel have results very similar to those for inferences about the regression structures of $\mu$ and $\phi$. In general, the Skovgaard adjustments show better performance in this case; however, the $LR_b$ statistic has similar performance, which is much better than that of the usual likelihood ratio statistic.

The objective of the second-order corrections considered here is to improve the approximation of the distribution of the LR test statistic by the limiting null chi-squared distribution. Table 2 presents estimated quantiles and moments of the considered statistics, as well as the reference values of $\chi^2_q$. The scenario testing the parameters of the $\mu$ submodel, under two restrictions, $q = 2$, and with $n = 40$ was considered for these results. It is verified that the distribution of the LR statistic is the farthest from the reference chi-squared distribution. Among the four statistics considered, the one with moments and quantiles closest to those of $\chi^2_q$ is the proposed $LR_b$. Still, it is observed that in general the corrected statistics present values of the calculated measures closer to the reference values of $\chi^2_q$ than the LR statistic does.

Figure 1: Quantile-quantile plots for the submodel of $\mu$, $q = 2$ and different sample sizes. Panels: (a) $n = 30$; (b) $n = 40$; (c) $n = 50$.
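The reference row of Table 2 can be reproduced from standard properties of the chi-squared distribution: with $q$ degrees of freedom the mean is $q$, the variance $2q$, the skewness $\sqrt{8/q}$ and the kurtosis $3 + 12/q$. For $q = 2$ the distribution is exponential with mean 2, so the quantile of order $p$ is $-2\log(1-p)$. The sketch below, with hypothetical function names, computes these values using only the standard library; the closed-form quantile is valid for $q = 2$ only.

```python
import math


def chi2_moments(q):
    """Mean, variance, skewness and kurtosis of a chi-squared law with q d.f."""
    return {
        "mean": float(q),
        "variance": 2.0 * q,
        "skewness": math.sqrt(8.0 / q),
        "kurtosis": 3.0 + 12.0 / q,
    }


def chi2_quantile_2df(p):
    """Quantile of order p for q = 2 degrees of freedom (exponential, mean 2)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -2.0 * math.log1p(-p)
```

Evaluating `chi2_quantile_2df` at 0.90, 0.95 and 0.99 gives 4.605, 5.991 and 9.210 after rounding to three decimals, matching the reference values against which the estimated quantiles of the four statistics are compared.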

Figure[T]shows the QQ-plot graphs (exact quantiles versus asymptotic quantiles) for different sample sizes, given 
the same scenario of the results of Table|2] It’s clear that the distribution of the proposed statistic is much closer to 
the reference null distribution, Xq- h was also observed that all the corrected statistics are closer to the reference null 
distribution of the usual LR statistic. 

Table 3 presents the Monte Carlo simulation results for the non-null rejection rates (power) of the tests on the
parameters of the submodels for $\mu$, $\phi$ and $\alpha$. Since the test based on the LR statistic is quite liberal,
we present only the results for $LR_b$, $LR_{sk1}$ and $LR_{sk2}$. For the mean submodel, we tested $\mathcal{H}_1 : \beta_2 = \delta$
($q = 1$), where $\delta = -1, -0.5, 0.5, 1$. For the submodel for $\phi$, we tested $\mathcal{H}_1 : \lambda_2 = \delta$ ($q = 1$), where $\delta = -4, -3, 3, 4$.
For the regression structure of $\alpha$, the tested hypotheses were $\mathcal{H}_1 : \gamma_2 = \delta$ ($q = 1$), where $\delta = 1, 2$.

Table 3 shows that the performances of the three statistics do not differ much across the three submodels.
The corrected statistic $LR_{sk1}$ is slightly more powerful in most scenarios. This result was expected, however,
since it is the most liberal among the corrected statistics. Simulations of power under two restrictions ($q = 2$) were also
considered. The results for $q = 1$ and $q = 2$ are similar, so the results for $q = 2$ are omitted for brevity.
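A non-null rejection rate is simply the percentage of Monte Carlo replications, generated under the alternative, in which the statistic exceeds the chi-squared critical value. The sketch below illustrates that computation under loud assumptions: it uses a toy normal-mean model (unit variance, $\mathcal{H}_0$: mean equal to zero, for which the LR statistic is $n\bar{x}^2$) instead of the inflated beta regression, and the names `rejection_rates` and `simulate_power` are hypothetical.

```python
import random

# Upper-tail critical values of the chi-squared distribution with 1 degree
# of freedom at the 1%, 5% and 10% nominal levels.
CHI2_1_CRITICAL = {0.01: 6.635, 0.05: 3.841, 0.10: 2.706}


def rejection_rates(statistics, critical_values=CHI2_1_CRITICAL):
    """Percentage of simulated statistics exceeding each critical value."""
    n = len(statistics)
    return {
        level: 100.0 * sum(s > crit for s in statistics) / n
        for level, crit in critical_values.items()
    }


def simulate_power(delta, n=30, replications=2000, seed=1):
    """Toy power study: data are N(delta, 1) and H0 states that the mean is 0.

    This stands in for the inflated beta model only to show the mechanics;
    with known unit variance the LR statistic equals n * (sample mean)^2.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(replications):
        sample_mean = sum(rng.gauss(delta, 1.0) for _ in range(n)) / n
        stats.append(n * sample_mean ** 2)
    return rejection_rates(stats)


for delta in (0.2, 0.5, 1.0):
    print(delta, simulate_power(delta))
```

As in Table 3, the rejection rates grow with the distance between the true parameter value and the null value, and with the nominal level.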

Table 3: Non-null rejection rates (%) for the submodels for $\mu$, $\phi$ and $\alpha$, under one restriction ($q = 1$).

                                  1%                5%                10%
$\delta$   Stat             n = 30   n = 50   n = 30   n = 50   n = 30   n = 50

Submodel for $\mu$
 -1        $LR_b$            84.92    96.62    96.68    99.56    98.40    99.82
           $LR_{sk1}$        87.06    97.06    97.02    99.60    98.48    99.86
           $LR_{sk2}$        85.62    96.96    96.48    99.60    98.36    99.84
 -0.5      $LR_b$            28.18    38.36    56.02    64.20    69.56    75.68
           $LR_{sk1}$        30.36    39.08    57.84    65.06    70.88    75.76
           $LR_{sk2}$        28.48    38.76    56.12    64.80    69.48    75.44
  0.5      $LR_b$            23.80    38.10    50.56    64.04    65.04    75.22
           $LR_{sk1}$        26.50    39.32    52.78    64.56    70.88    75.78
           $LR_{sk2}$        25.04    38.92    51.12    64.22    64.88    75.56
  1        $LR_b$            78.12    92.90    93.48    98.28    96.90    99.32
           $LR_{sk1}$        80.44    93.44    94.08    98.56    97.12    99.38
           $LR_{sk2}$        79.18    93.24    93.54    98.48    96.88    99.38

Submodel for $\phi$
 -4        $LR_b$            82.64    97.70    92.98    99.50    96.16    99.76
           $LR_{sk1}$        88.68    98.18    96.02    99.58    97.96    99.84
           $LR_{sk2}$        88.58    98.18    95.80    99.58    97.86    99.84
 -3        $LR_b$            56.06    81.30    77.10    92.50    84.34    95.54
           $LR_{sk1}$        65.74    83.24    82.48    93.08    88.82    96.00
           $LR_{sk2}$        65.44    83.14    82.24    93.04    88.62    96.00
  3        $LR_b$            46.02    70.24    70.40    86.92    80.70    92.22
           $LR_{sk1}$        48.68    72.00    71.94    87.94    81.68    92.94
           $LR_{sk2}$        47.64    71.90    70.48    87.78    80.76    92.80
  4        $LR_b$            72.58    92.76    88.44    97.70    92.82    99.10
           $LR_{sk1}$        74.60    93.74    88.62    98.00    92.86    99.10
           $LR_{sk2}$        73.58    93.64    87.86    98.00    92.10    99.10

Submodel for $\alpha$
  1        $LR_b$             2.40     5.84     8.62    17.28    15.10    26.33
           $LR_{sk1}$         2.58     5.84     8.64    17.34    15.66    26.10
           $LR_{sk2}$         2.42     5.76     8.42    17.22    15.48    26.02
  2        $LR_b$             8.72    30.22    23.54    54.64    35.16    67.02
           $LR_{sk1}$         8.72    30.18    23.10    55.14    35.28    66.88
           $LR_{sk2}$         8.58    30.10    22.78    55.12    35.04    66.84

The results presented show that the bootstrap Bartlett statistic proposed here performs well for inference
in small samples. $LR_b$ was shown to be equivalent or, in some cases, superior to the Skovgaard
analytical adjustment. Since the adjusted tests are more accurate and the proposed corrected
statistic is simpler to obtain, as it does not require costly analytical calculations, we recommend using the test based
on the bootstrap Bartlett statistic.
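To make the recommended correction concrete, the sketch below applies the usual bootstrap Bartlett rescaling, $LR_b = LR \cdot q / \overline{LR^*}$, where $\overline{LR^*}$ is the average of the LR statistics computed on parametric bootstrap samples generated under the null hypothesis. It is a minimal illustration under stated assumptions, not the paper's implementation: the inflated beta likelihood is replaced by a toy exponential model with $\mathcal{H}_0$: rate $=$ `rate0` ($q = 1$), and the function names are hypothetical.

```python
import math
import random


def lr_exponential(sample, rate0):
    """LR statistic for H0: rate = rate0 in an exponential model (q = 1).

    With t = rate0 * (sample mean), the statistic is 2n(t - 1 - log t).
    """
    n = len(sample)
    t = rate0 * sum(sample) / n
    return 2.0 * n * (t - 1.0 - math.log(t))


def bootstrap_bartlett(sample, rate0, q=1, n_boot=500, rng=None):
    """Return (LR, LR_b), with LR_b = LR * q / mean of the bootstrap LR values."""
    rng = rng or random.Random()
    n = len(sample)
    lr = lr_exponential(sample, rate0)
    # Parametric bootstrap under H0. This toy null model has no nuisance
    # parameters, so it is fully specified by rate0; in a regression model
    # the samples would come from the fit obtained under the null.
    lr_star = []
    for _ in range(n_boot):
        boot = [rng.expovariate(rate0) for _ in range(n)]
        lr_star.append(lr_exponential(boot, rate0))
    mean_lr_star = sum(lr_star) / n_boot
    return lr, lr * q / mean_lr_star


rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(15)]  # small sample, true rate 1
lr, lr_b = bootstrap_bartlett(sample, rate0=1.0, rng=rng)
print(f"LR = {lr:.3f}, bootstrap Bartlett LR_b = {lr_b:.3f}")
```

Only repeated evaluation of the LR statistic is needed, which is why no analytical derivatives of the log-likelihood enter the computation.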


5 An application 


This section presents an application to real data of the likelihood ratio test corrected via bootstrap Bartlett, proposed
in this paper. The data used are part of the work presented in Sampaio de Souza et al. (2005), which estimates levels
of efficiency for the Brazilian municipalities. These indexes take values in the range (0,1], where 1 corresponds to
the fully efficient municipalities. This application considers the 26 Brazilian state capitals, with data from the
year 2000. The proportion of ones in the data is equal to 0.12.


The variables considered in the database were: number of inhabitants ($x_1$); information ($x_2$), a binary
variable that equals 1 if the municipality is computerized and 0 otherwise; personnel expenses ($x_3$);
population density ($x_4$); percentage of households whose head earns up to 1 minimum wage ($x_5$); urbanization rate
($x_6$); updating index of the real estate register ($x_7$); a binary variable that equals 1 if the municipality is
located in the drought polygon area and 0 otherwise ($x_8$); and average income ($x_9$). Further details on these
and other related variables can be found in Sampaio de Souza et al. (2005).


For the mean submodel, the initial model was obtained with the function stepGAIC of the gamlss package
available in R (R Development Core Team, 2014). This function selects a model by a stepwise algorithm using
the generalized Akaike information criterion. For the submodels for $\phi$ and $\alpha$, the same covariates used in
Pereira and Cribari-Neto (2014b) were considered. Thus, initially we consider the following model:


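\begin{align*}
\log\left(\frac{\mu_t}{1-\mu_t}\right) &= \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t}, \\
\log(\phi_t) &= \lambda_0 + \lambda_1 x_{9t}, \\
\log\left(\frac{\alpha_t}{1-\alpha_t}\right) &= \gamma_0 + \gamma_1 x_{9t}.
\end{align*}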
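The model above maps covariates to the three parameters through a logit link for the mean, a log link for the precision and a logit link for the mixture parameter. The sketch below shows how these inverse links could be evaluated and how a response in (0,1] could be drawn. It rests on assumptions that should be checked against the model definition: the beta component is taken to be parametrized by mean $\mu$ and precision $\phi$, that is Beta($\mu\phi$, $(1-\mu)\phi$), $\alpha$ is taken to be the probability of observing exactly 1, and all coefficient and covariate values are hypothetical, not estimates from the paper.

```python
import math
import random


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def submodel_parameters(x1, x2, x3, x4, x9, beta, lam, gamma):
    """Evaluate mu, phi and alpha from the three linear predictors."""
    mu = inv_logit(beta[0] + beta[1] * x1 + beta[2] * x2
                   + beta[3] * x3 + beta[4] * x4)
    phi = math.exp(lam[0] + lam[1] * x9)            # log link
    alpha = inv_logit(gamma[0] + gamma[1] * x9)     # logit link
    return mu, phi, alpha


def one_inflated_mean(mu, alpha):
    """Mean of the mixture: alpha * 1 + (1 - alpha) * mu."""
    return alpha + (1.0 - alpha) * mu


def draw_one_inflated_beta(mu, phi, alpha, rng):
    """Draw y = 1 with probability alpha, otherwise y ~ Beta(mu*phi, (1-mu)*phi)."""
    if rng.random() < alpha:
        return 1.0
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)


# Hypothetical coefficients and covariate values, for illustration only.
beta = (0.3, 0.1, -0.2, 0.05, 0.0)
lam = (2.0, 0.1)
gamma = (-2.0, 0.0)
mu, phi, alpha = submodel_parameters(1.0, 1.0, 0.5, 0.2, 1.5, beta, lam, gamma)

rng = random.Random(10)
draws = [draw_one_inflated_beta(mu, phi, alpha, rng) for _ in range(5000)]
print("theoretical mean:", round(one_inflated_mean(mu, alpha), 3))
print("simulated mean:  ", round(sum(draws) / len(draws), 3))
```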


The tests were performed at the 10% nominal level. When testing the exclusion of the covariate $x_4$, $\mathcal{H}_0 : \beta_4 = 0$,
the values of the statistics (p-values in parentheses) are $LR = 3.609$ ($p = 0.057$) and $LR_b = 2.177$
($p = 0.140$). The inferential conclusions based on the corrected and uncorrected statistics are opposite.
According to the corrected $LR_b$ statistic, $\mathcal{H}_0$ is not rejected, so we exclude the covariate $x_4$ from
the submodel. To test the significance of $x_3$, $\mathcal{H}_0 : \beta_3 = 0$, we have $LR = 5.909$ ($p = 0.015$) and $LR_b = 3.837$
($p = 0.050$); both tests reject the null hypothesis, so $x_3$ remains in the submodel. When testing $\mathcal{H}_0 : \beta_2 = 0$, we
obtain $LR = 2.054$ ($p = 0.152$) and $LR_b = 1.509$ ($p = 0.219$); the null hypothesis is not rejected, so we
exclude the covariate $x_2$ from the submodel. Finally, for $\mathcal{H}_0 : \beta_1 = 0$, we have $LR = 8.287$ ($p = 0.004$) and $LR_b = 6.229$
($p = 0.013$), and both tests reject the null hypothesis. Based on the test corrected via bootstrap Bartlett, the adjusted
model is given by


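\begin{align*}
\log\left(\frac{\mu_t}{1-\mu_t}\right) &= \beta_0 + \beta_1 x_{1t} + \beta_3 x_{3t}, \\
\log(\phi_t) &= \lambda_0 + \lambda_1 x_{9t}, \\
\log\left(\frac{\alpha_t}{1-\alpha_t}\right) &= \gamma_0 + \gamma_1 x_{9t}.
\end{align*}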
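The p-values reported for the tests in this section can be checked against the $\chi^2_1$ reference distribution, since each hypothesis imposes a single restriction. The sketch below uses the identity $P(\chi^2_1 > s) = \mathrm{erfc}(\sqrt{s/2})$; the statistic values are those quoted in the text, while the function name `chi2_1_pvalue` and the dictionary layout are illustrative choices.

```python
import math


def chi2_1_pvalue(statistic):
    """Upper-tail p-value under chi-squared with 1 degree of freedom.

    P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics quoted in the text: (LR, LR_b) for each tested coefficient.
reported = {
    "beta_4": (3.609, 2.177),
    "beta_3": (5.909, 3.837),
    "beta_2": (2.054, 1.509),
    "beta_1": (8.287, 6.229),
}

NOMINAL_LEVEL = 0.10
for name, (lr, lr_b) in reported.items():
    p_lr, p_lr_b = chi2_1_pvalue(lr), chi2_1_pvalue(lr_b)
    print(f"{name}: LR p = {p_lr:.3f} (reject: {p_lr < NOMINAL_LEVEL}), "
          f"LR_b p = {p_lr_b:.3f} (reject: {p_lr_b < NOMINAL_LEVEL})")
```

For $\beta_4$ the two statistics fall on opposite sides of the 10% level, which is the disagreement discussed in the text.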


To evaluate the quality of the fitted model, selected with the corrected test, we consider the residual analysis proposed
by Ospina and Ferrari (2012). Figure 2 presents the plot of the randomized quantile residuals and the half-normal
probability plot with simulated envelope. In Figure 2(a), all residuals lie within the range (−2,2). In
Figure 2(b), all the points lie within the confidence bands of the simulated envelope, indicating a
good fit of the model.

(a) Residuals versus indexes.

Figure 2: Randomized quantile residual plots.
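As a rough illustration of how a randomized quantile residual can be computed for a response with a point mass at 1, the sketch below follows the general Dunn and Smyth construction: the fitted distribution function is evaluated at a continuous observation, a uniform draw over the jump of the distribution function is used at the point mass, and the result is mapped through the standard normal quantile function. This is an assumption about the mechanics, not the exact definition in Ospina and Ferrari (2012); the function name is hypothetical, the beta distribution function is passed in by the caller, and the demonstration uses the special case Beta(1, 1), whose distribution function is $F(y) = y$.

```python
import random
from statistics import NormalDist


def randomized_quantile_residual(y, alpha, beta_cdf, rng):
    """Randomized quantile residual for a one-inflated response in (0, 1].

    alpha is the probability mass at 1 and beta_cdf is the distribution
    function of the continuous (beta) component.
    """
    if y >= 1.0:
        # The mixture CDF jumps from 1 - alpha to 1 at y = 1, so the
        # residual is randomized uniformly over that jump.
        u = rng.uniform(1.0 - alpha, 1.0)
    else:
        u = (1.0 - alpha) * beta_cdf(y)
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)


# Demonstration with hypothetical values: alpha = 0.12 and a Beta(1, 1)
# continuous component (mean 0.5, precision 2), whose CDF is F(y) = y.
rng = random.Random(7)
alpha = 0.12
responses = [1.0 if rng.random() < alpha else rng.random() for _ in range(2000)]
residuals = [randomized_quantile_residual(y, alpha, lambda v: v, rng)
             for y in responses]

mean = sum(residuals) / len(residuals)
var = sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1)
print(f"residual mean = {mean:.3f}, variance = {var:.3f}")
```

When the model is correct, these residuals behave approximately like a standard normal sample, which is what the index plot and the half-normal plot with simulated envelope are meant to check.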

To test whether the model is correctly specified, we consider the RESET test for the inflated beta model presented
in Pereira and Cribari-Neto (2014a). This test yields $p = 0.997$, so we do not reject the null hypothesis that the
model is correctly specified.

Therefore, it appears that the model selected based on hypothesis testing using the bootstrap Bartlett corrected 
test provides a good fit. 


6 Conclusions 

The likelihood ratio statistic is typically used to perform hypothesis testing in inflated beta regression models.
However, if the sample is not large enough to guarantee good agreement between the distribution of the test statistic
and its limiting distribution, the approximate likelihood ratio test can be considerably oversized. In this paper
we propose a bootstrap Bartlett correction to the likelihood ratio statistic for improved inference in the inflated
beta regression model in small samples. Through Monte Carlo simulations we evaluated the proposed correction
and compared it with the Skovgaard adjustments (Pereira and Cribari-Neto, 2014b) and with the usual uncorrected
statistic. The simulation results indicate that the corrected statistics make the tests more accurate, reducing
size distortion in small samples. Moreover, the proposed bootstrap Bartlett correction yields results
very close to, or even better than, those of the analytical Skovgaard adjustments. The latter require second-order
derivatives of the log-likelihood of the model, while the proposed correction requires only a simple Monte
Carlo simulation. We believe that the proposed bootstrap Bartlett correction can be quite useful in practical situations,
and we recommend it to practitioners who model data using inflated beta regressions, since it is easy to obtain
and delivers accurate inferential results.